Budgeted Bandit Problems with Continuous Random Costs
Authors
Abstract
We study the budgeted bandit problem, in which each arm is associated with both a reward and a cost, and the objective is to design an arm-pulling algorithm that maximizes the total reward collected before the budget runs out. We consider both multi-armed bandits and linear bandits, focusing on the setting with continuous random costs. We propose an upper-confidence-bound-based algorithm for multi-armed bandits and a confidence-ball-based algorithm for linear bandits, and prove logarithmic regret bounds for both. Simulations verify the effectiveness of the proposed algorithms.
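The abstract does not spell out the exact index used by the proposed algorithms, so the following is only a minimal sketch of a generic budgeted UCB policy for the multi-armed case: pull the arm with the largest ratio of a reward upper confidence bound to a cost lower confidence bound until the budget is spent. The sampling callbacks, exploration constant, and cost floor are illustrative assumptions, not the paper's construction.

```python
import numpy as np

def budgeted_ucb(rewards_fn, costs_fn, n_arms, budget, c=1.0, cost_floor=1e-3):
    """Illustrative UCB-style policy for a budgeted MAB with continuous random costs.

    Assumptions (not from the paper): rewards_fn(i) and costs_fn(i) each draw one
    random reward / cost in [0, 1] for arm i, and the index is the ratio of a
    reward UCB to a cost LCB clipped away from zero.
    """
    pulls = np.zeros(n_arms)
    reward_sum = np.zeros(n_arms)
    cost_sum = np.zeros(n_arms)
    total_reward, spent, t = 0.0, 0.0, 0

    # Pull each arm once so every empirical mean is defined.
    for i in range(n_arms):
        r, k = rewards_fn(i), costs_fn(i)
        if spent + k > budget:          # cannot afford this pull, stop early
            return total_reward
        pulls[i] += 1
        reward_sum[i] += r
        cost_sum[i] += k
        total_reward += r
        spent += k
        t += 1

    while True:
        bonus = c * np.sqrt(np.log(t + 1) / pulls)
        r_ucb = np.minimum(reward_sum / pulls + bonus, 1.0)   # reward upper bound
        c_lcb = np.maximum(cost_sum / pulls - bonus, cost_floor)  # cost lower bound
        i = int(np.argmax(r_ucb / c_lcb))                     # ratio index
        r, k = rewards_fn(i), costs_fn(i)
        if spent + k > budget:          # budget exhausted, stop (simplified rule)
            return total_reward
        pulls[i] += 1
        reward_sum[i] += r
        cost_sum[i] += k
        total_reward += r
        spent += k
        t += 1
```

The ratio index is the natural way to trade reward against cost in a budgeted setting: it favors the arm with the best optimistic estimate of reward per unit cost, so an arm is abandoned only once its confidence intervals leave no room for it to be the most cost-efficient choice.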
Similar resources
Finite budget analysis of multi-armed bandit problems
In the budgeted multi-armed bandit (MAB) problem, a player receives a random reward and pays a cost after pulling an arm, and cannot pull any more arms after running out of budget. In this paper, we give an extensive study of upper confidence bound based algorithms and a greedy algorithm for budgeted MABs. We perform theoretical analysis on the proposed algorithms, and show that t...
Full text
Budgeted Learning, Part I: The Multi-Armed Bandit Case
We introduce and motivate the task of learning under a budget. We focus on a basic problem in this space: selecting the optimal bandit after a period of experimentation in a multi-armed bandit setting, where each experiment is costly, our total costs cannot exceed a fixed pre-specified budget, and there is no reward collection during the learning period. We address the computational complexity ...
Full text
Budgeted Multi-Armed Bandits with Multiple Plays
We study the multi-play budgeted multi-armed bandit (MP-BMAB) problem, in which pulling an arm yields both a random reward and a random cost, and a player pulls L (≥ 1) arms at each round. The player aims to maximize her total expected reward under a budget constraint B on the pulling costs. We present a multiple ratio confidence bound policy: At each round, we first calculate a truncated...
Full text
Budgeted Multi-Armed Bandit in Continuous Action Space
Multi-Armed Bandits (MABs) have been widely considered in the last decade to model settings in which an agent wants to learn the action providing the highest expected reward among a fixed set of available actions during the operational life of a system. Classical techniques provide solutions that minimize the regret due to learning in settings where selecting an arm has no cost. However, in many...
Full text
Thompson Sampling for Budgeted Multi-Armed Bandits
Thompson sampling is one of the earliest randomized algorithms for multi-armed bandits (MAB). In this paper, we extend Thompson sampling to the budgeted MAB, where there is a random cost for pulling an arm and the total cost is constrained by a budget. We start with the case of Bernoulli bandits, in which the random rewards (costs) of an arm are independently sampled from a Bernoulli distribution...
Full text
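The last related work above studies Thompson sampling for budgeted Bernoulli bandits. As a rough illustration of that idea (not necessarily that paper's exact algorithm), the sketch below keeps Beta posteriors over both the reward and the cost probability of each arm, samples from both, and pulls the arm with the largest sampled reward-to-cost ratio until the budget is exhausted; the priors, names, and stopping rule are assumptions made for this sketch.

```python
import numpy as np

def budgeted_thompson_sampling(reward_p, cost_p, budget, rng=None):
    """Illustrative Thompson sampling for a budgeted Bernoulli bandit.

    Assumptions: Beta(1, 1) priors on each arm's Bernoulli reward and cost
    probabilities; reward_p and cost_p are the true (hidden) parameters used
    only to simulate feedback; play stops once the spent cost reaches budget.
    """
    rng = rng or np.random.default_rng()
    n = len(reward_p)
    r_a, r_b = np.ones(n), np.ones(n)   # Beta parameters for reward posteriors
    c_a, c_b = np.ones(n), np.ones(n)   # Beta parameters for cost posteriors
    total_reward, spent = 0.0, 0.0

    while spent < budget:
        theta_r = rng.beta(r_a, r_b)                         # sampled reward probs
        theta_c = np.maximum(rng.beta(c_a, c_b), 1e-6)       # sampled cost probs, floored
        i = int(np.argmax(theta_r / theta_c))                # best sampled ratio
        r = rng.binomial(1, reward_p[i])                     # observed Bernoulli reward
        k = rng.binomial(1, cost_p[i])                       # observed Bernoulli cost
        total_reward += r
        spent += k
        r_a[i] += r                                          # posterior updates
        r_b[i] += 1 - r
        c_a[i] += k
        c_b[i] += 1 - k
    return total_reward
```

Sampling both the reward and the cost parameter, rather than only the reward, lets the randomization account for uncertainty on both sides of the reward-per-cost trade-off, which is the role confidence bounds play in the UCB-style policies discussed above.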